Abstract: Exploratory factor analysis is a complex multivariate statistical technique commonly employed in information systems, social science, education and psychology. This paper aims to provide a simplified collection of information to help researchers and practitioners who undertake exploratory factor analysis (EFA) make decisions about best practice. In particular, it provides practical and theoretical information on decisions about sample size, factor extraction, the number of factors to retain, and rotational methods.
1 Introduction

Factor analysis is an important tool used in the development, refinement, and evaluation of tests, scales, and measures (Williams, Brown et al. 2010). Exploratory factor analysis (EFA) is a widely used and broadly applied statistical approach in information systems, social science, education and psychology. Recently, exploratory factor analysis has been applied in a wide range of studies, including finding relationships between socioeconomic, land use and activity participation variables and travel patterns (Pitombo, E. Kawamoto et al. 2011), developing an evaluation instrument (Lovett and Zeiss 2002), assessing the service quality dimensions of Internet retailing (Yang, Peterson et al. 2003), e-commerce service quality (Cox and Dale 2001), evaluating animal movement (Brillinger, Preisler et al. 2004), Intranet adoption (Tang 2000), assessing motivation (Morris 2001), developing a survey instrument to examine consumer adoption of broadband (Dwivedi, Choudrie et al. 2006), and determining what types of services should be offered to college students (Majors and Sedlacek 2001).

A survey of PsycINFO yielded over 1700 studies that used some form of EFA. More than half of them used principal components analysis with varimax rotation as their method of data analysis, and the majority used the Kaiser criterion (retaining all factors with eigenvalues greater than one) to decide how many constructs to retain for rotation, even though this criterion will not always yield the best results for a particular data set (Costello and Osborne 2005).

The goal of this study is therefore to discuss the exploratory factor analysis protocol and to provide practical information for researchers and practitioners. In particular, the following points are discussed:

1) An overview of exploratory factor analysis, 

2) Sample size, 

3) Factor extraction methods, 

4) Techniques for deciding the number of factors to retain,

5) Types of rotational methods.


2 Factor Analysis 

Factor analysis (FA) has origins dating back more than 100 years, to the work of Pearson and Spearman (Spearman 1904). As a multivariate statistical procedure, factor analysis is commonly used in the fields of information systems, psychology, commerce and education, and is considered the approach of choice for interpreting self-report surveys (Bryant, Yarnold et al. 1999).

FA reduces a large number of variables into a smaller set of factors. It also establishes the underlying dimensions linking measured variables to latent constructs, thereby allowing the formation and refinement of theory. Moreover, it provides construct validity evidence for self-report scales (Gorsuch 1983; Hair, Anderson et al. 1995a; Tabachnick and Fidell 2001; Thompson 2004).
3 Types of Factor Analysis

Factor analysis is divided into two main categories, namely exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) (Williams, Brown et al. 2010). EFA is used when the researcher has no expectations about the number or nature of the factors. As the name suggests, it allows the investigator to explore the main dimensions in order to generate a theory or model from a relatively large set of latent constructs, often represented by a set of items (Pett, Lackey et al. 2003; Swisher, Beckstead et al. 2004; Thompson 2004; Henson and Roberts 2006). CFA, by contrast, is a form of structural equation modeling (SEM) that is applied to test a theory or model proposed by the researcher. Unlike EFA, CFA has assumptions and expectations, based on a priori theory, about the number of constructs and about which construct theories or models fit best (Williams, Brown et al. 2010).

Although both EFA and CFA try to account for as much variance as possible in a set of observed variables with a smaller set of latent variables, factors, or components, EFA is principally suited to scale development and is applied when there is little theoretical basis for specifying a priori the number and patterns of common factors (Hurley, Scandura et al.; Hayton, Allen et al. 2004). Tabachnick and Fidell (2001) also address the limitations of EFA, noting that “decisions about number of factors and rotational scheme are based on pragmatic rather than theoretical criteria”.


4 Exploratory Factor Analyses 

Although exploratory factor analysis appears to be a complex statistical method, the analysis proceeds sequentially and linearly, with many options along the way (Thompson 2004). The objectives of exploratory factor analysis (Pett, Lackey et al. 2003; Thompson 2004) are:

■ Reduction of the number of variables

■ Assessment of multicollinearity among correlated variables

■ Evaluation and detection of the unidimensionality of constructs

■ Evaluation of construct validity in a survey

■ Examination of the relationships or structure among variables

■ Development of theoretical constructs

■ Testing of proposed theories

According to Fabrigar, Wegener et al. (1999), there are five methodological issues that researchers should consider when using EFA. First, the researcher should determine whether EFA is the most appropriate statistical method for achieving the purpose of the study. Second, the variables of the study and the size and nature of the sample should be selected. Third, the extraction procedure should be chosen. Fourth, the researcher should determine the method for deciding the number of factors to retain. Fifth, the researcher needs to select the rotation method that will yield a final interpretable solution. Failure to make a proper decision about one or more of these methodological issues may lead to erroneous results and limit the utility of the EFA (Hogarty, Kromrey et al. 2004). Figure 1 shows the steps involved in implementing exploratory factor analysis.
Figure 1: Exploratory Factor Analysis
Implementation Steps 

4.1 Sample Size 

Although sample size is a significant issue in FA, the literature offers differing opinions and several rules of thumb (Gorsuch 1983; Tabachnick and Fidell 2001; Hogarty, Hines et al. 2005). Hogarty, Hines et al. (2005) noted this lack of agreement, stating that these “disparate recommendations have not served researchers well”. General guides include Tabachnick and Fidell's (2001) rule of thumb that at least 300 cases are needed for factor analysis. Hair, Anderson et al. (1995a) suggested that sample sizes should be 100 or greater. Comrey (1973) offered the following guide to sample sizes: 100 as poor, 200 as fair, 300 as good, 500 as very good, and 1000 or more as excellent. MacCallum, Widaman et al. (1999) showed that when communalities are high (greater than 0.60) and each factor is defined by several items, sample sizes can actually be relatively small (Henson and Roberts 2006). Other studies, such as Guadagnoli and Velicer (1988), stated that solutions with correlation coefficients above 0.80 require smaller sample sizes, while Sapnas and Zeller (2002) argued that even 50 cases may be adequate for factor analysis.
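The published rules of thumb above can be collected into a small lookup to see at a glance how a given sample fares. The following is a minimal Python sketch: the function name rate_sample_size is hypothetical, the thresholds are taken from the sources cited above, and reading each of Comrey's (1973) values as the lower bound of a band is an interpretive assumption. Because adequacy also depends on the strength of the data, the output is a starting point rather than a verdict.

```python
def rate_sample_size(n: int) -> dict:
    """Compare a sample size against several published rules of thumb for FA."""
    # Comrey (1973): 100 poor, 200 fair, 300 good, 500 very good, 1000+ excellent.
    # Assumption: each value is read as the lower bound of its band.
    comrey_bands = [(1000, "excellent"), (500, "very good"), (300, "good"),
                    (200, "fair"), (100, "poor")]
    comrey = next((label for cutoff, label in comrey_bands if n >= cutoff),
                  "below Comrey's scale")
    return {
        "comrey_1973": comrey,
        "hair_1995_min_100": n >= 100,               # Hair, Anderson et al. (1995a)
        "tabachnick_fidell_2001_min_300": n >= 300,  # Tabachnick and Fidell (2001)
    }


print(rate_sample_size(250))
# → {'comrey_1973': 'fair', 'hair_1995_min_100': True, 'tabachnick_fidell_2001_min_300': False}
```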

Previous studies have shown that the nature of the data determines the adequacy of a sample size (Fabrigar, Wegener et al. 1999; MacCallum, Widaman et al. 1999). In general, the stronger the data, the smaller the sample can be for an accurate analysis. “Strong data” in factor analysis means uniformly high communalities without cross-loadings, plus several variables loading strongly on each factor (Costello and Osborne 2005).

Costello and Osborne (2005) indicate that if the following problems emerge in the data, a larger sample can help determine whether or not the factor structure and individual items are valid:

1) Item communalities are considered “high” if they are 0.80 or greater (Velicer and Fava 1998), although this is unlikely to happen in real data. More common magnitudes are 0.40 to 0.70, which are known as low to moderate communalities. An item communality of less than 0.40 suggests either that the item is not related to the other items or that an additional construct needs to be explored.

2) Tabachnick and Fidell (2001) suggested that instrument items should load at 0.32 or higher, which equates to approximately 10% overlapping variance with the other items in that factor (0.32² ≈ 0.10). A “crossloading” item is an item that loads at 0.32 or higher on two or more factors. If there are several crossloaders, the items may be poorly written or the a priori factor structure could be flawed (Costello and Osborne 2005). Other researchers (Comrey and Lee 1992; Laura J. Burton and Stephanie M. Mazerolle 2011) emphasized 0.50 or higher, with no cross-loadings, as a good rule of thumb for the minimum loading of an item.

3) A construct with fewer than three items is 
generally weak and unstable; five or more strongly 
loading items (0.50 or better) are desirable and 
indicate a solid factor (Costello and Osborne 2005). 
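The three warning signs above can be checked mechanically once a loading matrix is available. The following Python sketch is illustrative only: the function name item_diagnostics and its default cut-offs (0.40 for communalities, 0.32 for cross-loadings, at least three items loading at 0.50 or better per factor) are assumptions drawn from the thresholds quoted in this section, and computing communalities as row sums of squared loadings assumes an orthogonal solution.

```python
import numpy as np


def item_diagnostics(loadings, low_comm=0.40, cross=0.32, strong=0.50, min_items=3):
    """Flag low communalities, cross-loadings and weak factors in a loading matrix.

    loadings: items x factors array from an orthogonal solution (the row sum of
    squared loadings equals the communality only when factors are uncorrelated).
    """
    L = np.asarray(loadings, dtype=float)
    communality = (L ** 2).sum(axis=1)
    low_communality = np.flatnonzero(communality < low_comm)
    # Cross-loading: |loading| >= 0.32 on two or more factors.
    crossloaders = np.flatnonzero((np.abs(L) >= cross).sum(axis=1) >= 2)
    # Weak factor: fewer than `min_items` items loading at 0.50 or better.
    strong_counts = (np.abs(L) >= strong).sum(axis=0)
    weak_factors = np.flatnonzero(strong_counts < min_items)
    return {
        "communality": communality,
        "low_communality_items": low_communality.tolist(),
        "crossloading_items": crossloaders.tolist(),
        "weak_factors": weak_factors.tolist(),
    }


# Hypothetical six-item, two-factor solution (rows are items, columns factors).
example = [[0.80, 0.10], [0.75, 0.05], [0.70, 0.20],
           [0.45, 0.40], [0.10, 0.65], [0.20, 0.30]]
report = item_diagnostics(example)
print(report["low_communality_items"])  # → [3, 5]
print(report["crossloading_items"])     # → [3]
print(report["weak_factors"])           # → [1]
```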


4.1.1 Correlation matrix 

In EFA, the correlation matrix, one of the most popular statistical techniques (Henson and Roberts 2006), is used to determine the relationships between variables. Tabachnick and Fidell (2001) recommended inspecting the correlation matrix for correlation coefficients over 0.30.

Williams, Brown et al. (2010) interpret a loading of 0.30 as indicating that the factors account for approximately 30% of the relationship within the data; in practical terms, a third of the variables would share too much variance, making it impractical to determine whether the variables are correlated with each other or with the dependent variable (multicollinearity). Note that the proportion of variance two variables share is the square of their correlation, so a coefficient of 0.30 corresponds to about 9% shared variance (0.30² = 0.09).

Hair, Anderson et al. (1995a) categorised correlation loadings as 0.30 = minimal, 0.40 = important, and 0.50 = practically significant. If the correlations are less than 0.30, the researcher should reconsider whether FA is an appropriate approach for the research (Hair, Anderson et al. 1995a; Tabachnick and Fidell 2001). If the correlation matrix is an identity matrix (that is, there is no relationship among the items) (Kaiser 1958), EFA should not be applied.
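Tabachnick and Fidell's (2001) recommendation to inspect the correlation matrix for coefficients above 0.30 can be checked with a few lines of NumPy. The sketch below is an illustration under assumptions: the function name correlation_screen is hypothetical, the data are assumed to be a cases-by-variables array, and it reports the share of off-diagonal coefficients whose absolute value exceeds 0.30 together with the largest shared variance (r squared).

```python
import numpy as np


def correlation_screen(data, threshold=0.30):
    """Summarise the off-diagonal correlations of a cases-by-variables array."""
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)  # variables are columns
    off_diag = np.abs(R[~np.eye(R.shape[0], dtype=bool)])
    return {
        "share_above_threshold": float(np.mean(off_diag > threshold)),
        "max_abs_r": float(off_diag.max()),
        # Shared variance is r squared, so r = 0.30 means roughly 9% overlap.
        "max_shared_variance": float(off_diag.max() ** 2),
    }


# Two orthogonal contrast columns: every off-diagonal correlation is zero,
# the identity-matrix situation in which EFA should not be applied.
uncorrelated = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
print(correlation_screen(uncorrelated)["share_above_threshold"])  # → 0.0
```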


4.1.2 Kaiser-Meyer-Olkin (KMO) and Bartlett's Test

Prior to the extraction of the constructs, several tests must be conducted to examine the adequacy of the sample and the suitability of the data for FA (Laura J. Burton and Stephanie M. Mazerolle 2011). Sampling adequacy provides the researcher with information about the grouping of survey items, and grouping items into a set of interpretable factors can better explain the constructs under investigation. Measures of sampling adequacy evaluate how strongly an item is correlated with the other items in the EFA correlation matrix (Laura J. Burton and Stephanie M. Mazerolle 2011).

Sampling adequacy can be assessed by examining the Kaiser-Meyer-Olkin (KMO) measure (Kaiser 1970). KMO is suggested when the cases-to-variable ratio is less than 1:5. Its values range from 0 to 1, and according to Hair, Anderson et al. (1995a) and Tabachnick and Fidell (2001), a value of 0.50 is considered suitable for FA. On the other hand, Netemeyer, Bearden et al. (2003) stated that a KMO value above 0.60 to 0.70 is considered adequate for analyzing the EFA output.

Bartlett's test of sphericity (Bartlett 1950) provides a chi-square statistic that tests whether the correlation matrix is an identity matrix. A significant result (p < .05) indicates that it is not, so the test should be significant for factor analysis to be suitable (Hair, Anderson et al. 1995a; Tabachnick and Fidell 2001).
In brief, if the KMO indicates sampling adequacy and Bartlett's test of sphericity indicates that the item correlation matrix is not an identity matrix, researchers can proceed with the FA (Netemeyer, Bearden et al. 2003).
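Both checks can be computed directly from the correlation matrix. The Python sketch below is illustrative rather than a reference implementation: kmo and bartlett_sphericity are hypothetical names, it assumes NumPy and SciPy are available, and it uses the standard formulas, in which the overall KMO compares the squared correlations with the squared partial (anti-image) correlations, and Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom for p variables and n cases.

```python
import numpy as np
from scipy.stats import chi2


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    # Partial (anti-image) correlations: a_ij = -inv_ij / sqrt(inv_ii * inv_jj).
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + a2))


def bartlett_sphericity(R, n):
    """Bartlett's (1950) test of H0: R is an identity matrix; n is the number of cases."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return float(stat), df, float(chi2.sf(stat, df))


# Four variables that all correlate 0.5 with one another, observed on 100 cases.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
print(round(kmo(R), 3))                # → 0.8
stat, df, p_value = bartlett_sphericity(R, n=100)
print(round(stat, 1), df)              # → 112.6 6
```

For an identity matrix the statistic is zero and the p-value is 1, which is exactly the situation in which the test tells the researcher not to proceed.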

4.2 Factor Extraction 

There are several ways to extract factors: principal components analysis (PCA), principal axis factoring (PAF), image factoring, maximum likelihood, alpha factoring, unweighted least squares, generalised least squares and canonical factoring (Tabachnick and Fidell 2001; Thompson 2004; Costello and Osborne 2005). However, PCA and PAF are the most commonly used (Tabachnick and Fidell 2001; Thompson 2004; Henson and Roberts 2006). Whether to use PCA or PAF is fiercely debated among analysts (Henson and Roberts 2006), although the practical differences between the two are often insignificant (Thompson 2004); according to Gorsuch (1983), there are no significant differences when the variables have high reliability or when there are thirty or more variables.

Thompson (2004) stated that PCA is used most often because it is the default method in many statistical software packages. PCA is suggested when no prior theoretical basis or model exists (Gorsuch 1983). Moreover, Pett, Lackey et al. (2003) recommended using PCA to establish preliminary solutions in EFA. According to Costello and Osborne (2005), however, factor analysis is preferable to principal components analysis, which is only a data reduction approach. If a researcher has initially developed an instrument with several items and is interested in reducing the number of items, then PCA is useful (Netemeyer, Bearden et al. 2003).

PCA is computed without regard to any underlying structure caused by latent variables: components are calculated using all of the variance of the manifest variables, and all of that variance appears in the solution (Ford, MacCallum et al. 1986). When the factors are uncorrelated and communalities are moderate, PCA can produce inflated values of the variance accounted for by the components (McArdle 1990; Gorsuch 1997).

Conversely, principal axis factoring is useful when the researcher wants to determine the underlying factors related to a set of items (Laura J. Burton and Stephanie M. Mazerolle 2011).

On the other hand, Fabrigar, Wegener et al. (1999) stated that if the data are relatively normally distributed, maximum likelihood (ML) is the best choice because “it allows for the computation of a wide range of indexes of the goodness of fit of the model and permits statistical significance testing of factor loadings and correlations among factors and the computation of confidence intervals.”

Overall, according to Costello and Osborne (2005), maximum likelihood or principal axis factoring will give researchers the best results, depending on whether the data are generally normally distributed or significantly non-normal, respectively.
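The difference between the two most common extraction methods lies in what sits on the diagonal of the matrix being decomposed: PCA analyses the correlation matrix with unities on the diagonal, so all variance appears in the solution, whereas principal axis factoring replaces the diagonal with communality estimates. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, PAF starts from squared multiple correlations and is iterated until the communalities stabilise (one common variant among several), and maximum likelihood extraction is not shown.

```python
import numpy as np


def _top_loadings(M, k):
    """Loadings for the k largest eigenvalues of symmetric M, signed so columns sum positive."""
    vals, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]
    loadings = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return loadings * signs


def pca_loadings(R, k):
    """Principal components: decompose R with 1s on the diagonal (all variance)."""
    return _top_loadings(np.asarray(R, dtype=float), k)


def paf_loadings(R, k, max_iter=1000, tol=1e-10):
    """Iterated principal axis factoring: communality estimates on the diagonal."""
    R = np.asarray(R, dtype=float)
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))          # start from squared multiple correlations
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        L = _top_loadings(reduced, k)
        new_h2 = (L ** 2).sum(axis=1)               # updated communalities
        if np.max(np.abs(new_h2 - h2)) < tol:
            break
        h2 = new_h2
    return L


loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)                            # population correlations of a one-factor model
print(np.round(paf_loadings(R, 1).ravel(), 3))      # → [0.8 0.7 0.6 0.5]
print(np.round(pca_loadings(R, 1).ravel(), 3))      # larger values: unique variance is absorbed
```

In this constructed example PAF recovers the generating loadings, while the PCA loadings are larger because the unique variance of each item is folded into the component, consistent with the inflation described above.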


4.3 Factor Retention Methods 

After the extraction phase, the researcher must decide how many constructs to retain for rotation. This decision is considered more important than those made in the other phases, and Hayton, Allen et al. (2004) point out three reasons why. First, there is evidence that results are robust across the alternatives available for the other decisions (Zwick and Velicer 1986). Second, exploratory factor analysis needs to balance parsimony with adequate representation of the underlying correlations, so its utility depends on being able to distinguish major factors from minor ones (Fabrigar, Wegener et al. 1999). Third, there is conceptual and empirical evidence that both underextraction and overextraction are substantial errors that affect the results, although specifying too few factors is traditionally considered more severe. Both types of misspecification have been empirically shown to lead to poor reproduction and interpretation of the factor-loading pattern (Velicer, Eaton et al. 2000; Hayton, Allen et al. 2004), and both also affect the efficiency and meaning of the EFA (Ledesma and Valero-Mora 2007).

A number of criteria are available to assist this decision, but they do not always lead to the same or even similar results (Zwick and Velicer 1986; Thompson and Daniel 1996). Factor retention methods include the cumulative percentage of variance extracted, Kaiser's criterion (the eigenvalue > 1 rule) (Kaiser 1960), the scree test (Cattell 1966), the minimum average partial (MAP) test (Velicer 1976) and parallel analysis (Horn 1965). Hair, Anderson et al. (1995a) noted that most factor analysts use multiple criteria.

4.3.1 Cumulative Percentage of Variance 

There is no agreement on the cumulative percentage of variance (CPV) threshold in FA, particularly across research areas (Henson and Roberts 2006). For instance, according to Hair, Anderson et al. (1995a), in the natural sciences factor extraction should stop only when at least 95% of the variance is explained, whereas in the humanities the explained variance is commonly as low as 50 to 60% (Hair, Anderson et al. 1995a; Pett, Lackey et al. 2003). There is evidence to suggest that the results of exploratory factor analysis are more accurate when each common factor is represented by multiple variables in the analysis (Williams, Brown et al. 2010). In this regard, MacCallum, Widaman et al. (1999) posit that substantial distortion in results may occur when EFA is performed on variables with low communalities.
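Whatever target a field adopts, the CPV criterion reduces to finding the smallest number of factors whose eigenvalues reach that share of the total variance. A minimal Python sketch follows; the function name factors_for_cpv is hypothetical, and the eigenvalues are assumed to come from the correlation matrix.

```python
from itertools import accumulate


def factors_for_cpv(eigenvalues, target_percent):
    """Smallest number of factors whose cumulative % of variance reaches target_percent."""
    eig = sorted(eigenvalues, reverse=True)
    total = sum(eig)
    for k, cumulative in enumerate(accumulate(eig), start=1):
        if 100 * cumulative / total >= target_percent:
            return k
    return len(eig)


eigenvalues = [2.5, 1.0, 0.3, 0.2]        # hypothetical eigenvalues of a 4-variable R
print(factors_for_cpv(eigenvalues, 60))  # → 1  (62.5% explained)
print(factors_for_cpv(eigenvalues, 80))  # → 2  (87.5% explained)
```

The same data can therefore support one factor under a humanities-style target and more under a stricter natural-science target, which is why the criterion is rarely used on its own.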


4.3.2 K1 - Kaiser’s eigenvalue > 1 

According to Kaiser's K1 method (Kaiser 1960), only constructs with eigenvalues greater than one should be retained for interpretation. This approach may be the best known and most used in practice (Fabrigar, Wegener et al. 1999) because of its theoretical basis and ease of use (Gorsuch 1983).

Despite its widespread adoption and simplicity, many researchers have argued that this method is problematic and inefficient. There are three problems with using it for factor retention (Fabrigar, Wegener et al. 1999; Hayton, Allen et al. 2004). Firstly, the approach was proposed for the PCA case, in which eigenvalues are computed from the correlation matrix with unities on the diagonal; it is not a valid rule in the EFA case, in which the diagonal holds communality estimates (Gorsuch 1983). Secondly, the rule is somewhat arbitrary in that it draws distinctions between factors with eigenvalues just above and just below 1 (Fabrigar, Wegener et al. 1999; Hayton, Allen et al. 2004). Lastly, the approach has shown a tendency to substantially overestimate the number of factors and, in some cases, even to underestimate them (Horn 1965; Zwick and Velicer 1986); Linn (1968), for example, demonstrated that K1 overestimated the correct number of factors by 66%. Kaiser himself reported that the number of components retained by K1 is commonly between one-third and one-fifth or one-sixth of the number of variables included in the correlation matrix (Zwick and Velicer 1986). A number of studies have reported that K1 is among the least accurate factor retention methods (Velicer and Jackson 1990; Fabrigar, Wegener et al. 1999; Ledesma and Valero-Mora 2007).
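The K1 rule itself is a one-line count, which partly explains its popularity. The Python sketch below (kaiser_k1 is a hypothetical name) applies it, as originally intended, to the eigenvalues of the unreduced correlation matrix with unities on the diagonal. The usage lines illustrate the arbitrariness noted above: two variables correlated at only 0.01 produce eigenvalues of 1.01 and 0.99, so K1 retains one factor for essentially unrelated variables.

```python
import numpy as np


def kaiser_k1(R):
    """Count eigenvalues of the correlation matrix R that exceed 1 (Kaiser 1960)."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))[::-1]  # descending
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


near_independent = np.array([[1.0, 0.01], [0.01, 1.0]])
n_factors, eig = kaiser_k1(near_independent)
print(n_factors, np.round(eig, 2))   # → 1 [1.01 0.99]
```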


4.3.3 Scree Test 

Another popular method for determining the number of factors to retain is Cattell's scree test (Cattell 1966), which involves visually inspecting a plot of the eigenvalues for breaks or discontinuities.

The number of data points above the break (not including the point at which the break occurs) is the number of factors to retain. The logic behind this method is that this point divides the important or major factors from the minor or trivial ones (Ledesma and Valero-Mora 2007).

As noted by several authors (Gorsuch 1983; Tabachnick and Fidell 2001; Thompson 2004), interpreting scree plots is subjective and requires researcher judgment, so the number of factors retained, and hence the results, can differ between researchers (Zwick and Velicer 1986; Pett, Lackey et al. 2003). This disagreement and subjectivity are reduced, however, when the sample size is large, the N:p ratio is greater than 3:1 and communalities are high (Linn 1968; Gorsuch 1983; Pett, Lackey et al. 2003).

Nonetheless, Zwick and Velicer's (1986) comparison concluded that the scree test performed better than the K1 rule, although it was still correct only 57% of the time, and in most of the inaccurate cases the number of factors was overestimated (Ledesma and Valero-Mora 2007). Even so, Costello and Osborne (2005) regard the scree test as the best choice for researchers.
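Scree plots are normally drawn with plotting software, but the idea can be shown with a crude text version. The Python sketch below is illustrative only: text_scree is a hypothetical helper, and it merely draws the plot; locating the break remains the researcher's judgment, as discussed above.

```python
def text_scree(eigenvalues, width=40):
    """Return a crude text scree plot: one bar per eigenvalue, largest first."""
    eig = sorted(eigenvalues, reverse=True)
    top = eig[0]
    lines = []
    for i, value in enumerate(eig, start=1):
        # Bar length is proportional to the eigenvalue (at least one mark if positive).
        bar = "#" * max(1, round(width * value / top)) if value > 0 else ""
        lines.append(f"{i:>3} {value:6.2f} {bar}")
    return lines


for line in text_scree([3.1, 1.4, 0.6, 0.5, 0.4]):   # hypothetical eigenvalues
    print(line)
```

In this made-up example a researcher might place the break at the third eigenvalue, where the curve flattens, which by Cattell's rule would suggest retaining two factors; another reader could reasonably judge it differently.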


4.3.4 Minimum Average Partial 

Velicer (1976) proposed the Minimum Average Partial (MAP) method, which calculates the average of the squared partial correlations after each successive component is partialled out. When the minimum average squared partial correlation is reached, the residual matrix most closely resembles an identity matrix and no further components are extracted (Hayton, Allen et al. 2004).

By the nature of this method, a factor with low loadings will not be retained, so each retained factor will have at least two variables with high loadings (Zwick and Velicer 1986). Because of this property, MAP can be inaccurate: it has consistently underestimated the number of major components when factor loadings are low or there are few variables per factor (Ledesma and Valero-Mora 2007).

Nevertheless, some researchers (Zwick and Velicer 1986; Wood, Tataryn et al. 1996) argued that MAP is better able to select the number of components than CPV, K1 and the scree test. Furthermore, Zwick and Velicer (1986) reported that MAP was correct 84% of the time, making it the second most accurate method of factor selection in their comparison.
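The MAP procedure can be written compactly in Python. This is a sketch under assumptions, not Velicer's original code: velicer_map is a hypothetical name, components come from an eigendecomposition of the correlation matrix, and the index of the smallest average squared partial correlation (counting m = 0, nothing partialled out) is taken as the number of components to retain.

```python
import numpy as np


def velicer_map(R):
    """Velicer's (1976) minimum average partial test for a correlation matrix R.

    Returns the number of components to retain and the average squared partial
    correlation for m = 0, 1, ..., p - 1 components partialled out.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]                     # largest eigenvalue first
    loadings = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
    off = ~np.eye(p, dtype=bool)
    averages = [float(np.mean(R[off] ** 2))]           # m = 0: nothing partialled out
    for m in range(1, p):
        A = loadings[:, :m]
        C = R - A @ A.T                                # partial covariance matrix
        d = np.diag(C)
        if np.any(d <= 1e-12):                         # residual variance used up
            averages.append(float("inf"))
            continue
        partial = C / np.sqrt(np.outer(d, d))
        averages.append(float(np.mean(partial[off] ** 2)))
    return int(np.argmin(averages)), averages


R = np.full((6, 6), 0.5)                               # six items, one common component
np.fill_diagonal(R, 1.0)
n_components, averages = velicer_map(R)
print(n_components, round(averages[0], 3), round(averages[1], 3))  # → 1 0.25 0.04
```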
4.3.5 Parallel Analysis

Parallel analysis (PA) was proposed by Horn (1965). PA compares the observed eigenvalues extracted from the correlation matrix to be analyzed with those obtained from uncorrelated normal variables. A component is considered important if its actual eigenvalue surpasses the corresponding eigenvalue, in rank order, from the random data (Hayton, Allen et al. 2004; Ledesma and Valero-Mora 2007).

Various researchers point out that parallel analysis is the best method for determining how many factors to retain (Humphreys and Montanelli 1975; Zwick and Velicer 1986; Glorfeld 1995; Thompson and Daniel 1996; Ledesma and Valero-Mora 2007). In addition, Zwick and Velicer (1986) found that parallel analysis was correct 92% of the time and showed the least variability and sensitivity to different factors.

In brief, PA is an appropriate method for deciding the number of factors to retain in exploratory factor analysis, although it is not widely used.
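Parallel analysis is straightforward to simulate. The Python sketch below is illustrative, with parallel_analysis a hypothetical name: it generates uncorrelated standard normal data of the same size as the observed data and retains leading factors while each observed eigenvalue exceeds the corresponding random one. Horn's original comparison used the mean of the random eigenvalues; the 95th percentile used here by default follows the more conservative variant associated with Glorfeld (1995), and the number of simulations and the seed are arbitrary choices.

```python
import numpy as np


def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Horn's (1965) parallel analysis on eigenvalues of the correlation matrix."""
    X = np.asarray(data, dtype=float)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        Z = rng.standard_normal((n, p))     # uncorrelated normal variables, same n and p
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    reference = np.percentile(random_eigs, percentile, axis=0)
    # Retain leading factors while observed eigenvalues beat the random ones.
    keep = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        keep += 1
    return keep, observed, reference


rng = np.random.default_rng(42)
factor = rng.standard_normal(500)
items = 0.8 * factor[:, None] + 0.6 * rng.standard_normal((500, 6))  # one common factor
n_keep, observed, reference = parallel_analysis(items, n_sims=50)
print(n_keep)  # → 1
```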

4.4 Selection of Rotational Method 

An additional issue, once the researcher has decided how many constructs to retain, is whether a variable might relate to more than one factor (Williams, Brown et al. 2010). Rotation helps to produce a more interpretable and simplified solution by maximizing high item loadings and minimizing low item loadings. There are two types of rotation technique: oblique and orthogonal.

Oblique rotation is more accurate when the data do not meet a priori assumptions (Costello and Osborne 2005). It allows the factors to correlate, producing construct structures that are correlated. Quartimin, direct oblimin and promax are commonly available methods of oblique rotation.

In contrast, orthogonal rotation produces factors that are uncorrelated. Options for orthogonal rotation include quartimax, varimax and equamax. Costello and Osborne (2005) stated that orthogonal rotation produces more easily interpretable results and is slightly simpler than oblique rotation.

Varimax rotation, developed by Kaiser (1958), is the most common rotational method in exploratory factor analysis and will often provide a simple structure (Thompson 2004). For oblique rotation, Fabrigar, Wegener et al. (1999) stated that there is no widely preferred technique and that all techniques tend to produce similar outputs.
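Varimax is available in most statistical packages; to show what it does, the Python sketch below implements the widely used SVD-based iteration for the raw varimax criterion (without Kaiser row normalisation, a simplifying assumption). The function name is hypothetical. Because the rotation is orthogonal, each item's communality is unchanged; only the distribution of loadings across factors changes.

```python
import numpy as np


def varimax(loadings, max_iter=500, tol=1e-10):
    """Raw varimax rotation (Kaiser 1958) of an items-by-factors loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Target derived from the varimax criterion; the SVD gives the closest orthogonal update.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion < previous * (1 + tol):
            break
    return L @ rotation, rotation


simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.pi / 6                      # disguise the structure with a 30-degree rotation
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
rotated, _ = varimax(simple @ turn)
print(np.round(np.abs(rotated), 2))    # simple structure again (up to column order and sign)
```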

4.5 Interpretation 

Interpretation is the process of examining which variables are attributable to a construct and assigning a name to that construct. The labeling of constructs is a theoretical, subjective and inductive process (Pett, Lackey et al. 2003). It is important that construct labels reflect the theoretical and conceptual intent.

For instance, a construct may include four variables that are all related to user satisfaction, so the label “user satisfaction” would be assigned to that construct. Henson and Roberts (2006) stated that at least two or three variables must load on a factor for it to be given a meaningful interpretation.
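Naming remains a theoretical judgment, but grouping items by the factor on which they load most strongly is a routine first step before labeling. The Python sketch below is illustrative: salient_items is a hypothetical name, the 0.40 cut-off is an assumed value between the 0.32 and 0.50 thresholds discussed earlier, the three-item minimum follows the upper end of Henson and Roberts' (2006) guidance, and the item names are invented.

```python
import numpy as np


def salient_items(loadings, item_names, cutoff=0.40, min_items=3):
    """Group items under the factor on which they load most strongly (|loading| >= cutoff)."""
    L = np.asarray(loadings, dtype=float)
    groups = {factor: [] for factor in range(L.shape[1])}
    for i, name in enumerate(item_names):
        best = int(np.argmax(np.abs(L[i])))
        if abs(L[i, best]) >= cutoff:
            groups[best].append(name)
    # Factors with too few salient items are hard to interpret or label.
    too_few = [factor for factor, items in groups.items() if len(items) < min_items]
    return groups, too_few


names = ["sat1", "sat2", "sat3", "sat4", "use1", "use2"]   # hypothetical survey items
L = [[0.78, 0.10], [0.72, 0.15], [0.69, 0.05], [0.66, 0.20], [0.12, 0.71], [0.25, 0.35]]
groups, too_few = salient_items(L, names)
print(groups)   # → {0: ['sat1', 'sat2', 'sat3', 'sat4'], 1: ['use1']}
print(too_few)  # → [1]
```

Here factor 0 gathers the four satisfaction items and could be labeled “user satisfaction”, while factor 1 has too few salient items to support a meaningful label.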


5 Conclusions 

Factor measurement, definition and instrument validity are vital to information systems, social science, education and psychology. EFA is a complex multivariate statistical method involving many linear and sequential steps. The intention of this research was to provide fundamental information about EFA in the form of a stepwise, user-friendly guideline.

This paper suggests a five-step guide for implementing exploratory factor analysis, which includes: (1) evaluating sample size adequacy using the correlation matrix, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test; (2) choosing a factor extraction method, such as principal components analysis, principal axis factoring, image factoring, maximum likelihood, alpha factoring, unweighted least squares, generalised least squares or canonical factoring; (3) selecting factor retention methods, using the cumulative percentage of variance, Kaiser's K1 rule, the scree test, the minimum average partial approach and parallel analysis; (4) selecting a rotational method, either orthogonal or oblique; and finally (5) interpreting and labeling the factors.